This project analyzes the stock markets of major European semiconductor companies. The goal of the project is to retrieve financial data from yfinance and use it to forecast the companies' stock prices with time series analysis and machine learning. The results can be applied to trading and financial decision-making. Note that this project does not itself provide such decision-making; it serves only as a general analysis and guideline.
Initially this project also featured an attempt at forecasting stock close prices with a hybrid LSTM-ARIMA model (inspired by this paper), but after many failed attempts it was scrapped. The original paper does not use the model for time series forecasting, but for trend and buy/sell signal detection. There are many other examples of LSTMs being used to forecast stock data, but for a very volatile market they may not be the best fit.
Data cleansing
Code
import yfinance as yf
import random
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import seaborn as sns
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px
import time
from datetime import datetime, timedelta

tickers = ["ASML.AS", "NXPI", "IFX.DE", "BESI.AS",
           "NOD.OL", "MELE.BR", "AIXA.DE", "SMHN.DE", "AWEVF"]
all_data = {}
yesterday = datetime.today() - timedelta(days=1)
yesterday_str = yesterday.strftime('%Y-%m-%d')

# Fetch data in relative time to get reliable results
for ticker in tickers:
    for attempt in range(3):
        try:
            stock = yf.Ticker(ticker)
            hist = stock.history(period="max", end=yesterday_str)
            if hist is None or hist.empty:
                display(f"No data for {ticker}, attempt {attempt+1}")
                time.sleep(2)
                continue
            all_data[ticker] = hist
            break
        except Exception as e:
            display(f"Error fetching {ticker}: {e}, attempt {attempt+1}")
            time.sleep(2)

# Check out ASML data as a test
if "ASML.AS" in all_data:
    display("ASML stocks tail")
    display(all_data["ASML.AS"].tail())
else:
    display("ASML.AS data not available")

# Clean and process the data into continuous time series
processed_data = {}
for ticker, df in all_data.items():
    if df.empty:
        continue
    df.index = df.index.tz_localize(None)
    df_continuous = df.asfreq('D')
    cols_to_ffill = ['Open', 'High', 'Low', 'Close', 'Adj Close']
    existing_cols = [c for c in cols_to_ffill if c in df_continuous.columns]
    df_continuous[existing_cols] = df_continuous[existing_cols].ffill()
    if 'Volume' in df_continuous.columns:
        df_continuous['Volume'] = df_continuous['Volume'].fillna(0)
    processed_data[ticker] = df_continuous
'ASML stocks tail'

| Date                      | Open        | High        | Low         | Close       | Volume | Dividends | Stock Splits |
|---------------------------|-------------|-------------|-------------|-------------|--------|-----------|--------------|
| 2026-02-09 00:00:00+01:00 | 1200.000000 | 1205.000000 | 1177.400024 | 1204.800049 | 456425 | 1.6       | 0.0          |
| 2026-02-10 00:00:00+01:00 | 1196.000000 | 1212.400024 | 1185.800049 | 1193.000000 | 459476 | 0.0       | 0.0          |
| 2026-02-11 00:00:00+01:00 | 1185.400024 | 1224.000000 | 1176.599976 | 1207.800049 | 530632 | 0.0       | 0.0          |
| 2026-02-12 00:00:00+01:00 | 1225.000000 | 1225.000000 | 1176.599976 | 1179.800049 | 558696 | 0.0       | 0.0          |
| 2026-02-13 00:00:00+01:00 | 1190.599976 | 1210.599976 | 1173.800049 | 1190.400024 | 708101 | 0.0       | 0.0          |
Line chart plot
After cleaning and processing the data, the next step is to visualize the stock prices in a clean line chart. Plotly offers some of the cleanest and most interactive visualizations for this. Plotly does have downsides, however, mainly high memory use and slow rendering, which is why it is not recommended for very large data analytics.
Code
fig = go.Figure()
for ticker, data in processed_data.items():
    fig.add_trace(
        go.Scatter(
            x=data.index,
            y=data['Close'],
            mode='lines',
            name=f"{ticker} Close"
        )
    )
fig.update_layout(
    title="European Semiconductor Companies - Close Prices",
    xaxis_title="Time",
    yaxis_title="Close Price (€ or $ depending on listing)",
    legend_title="Company"
)
fig.show()
Figure 1: Time series line plot
The same line chart, restricted to the last 500 days:
Code
fig = go.Figure()
for ticker, data in processed_data.items():
    data = data.tail(500)
    # Quick sanity check on the plotted range
    print(data['Close'].min(), data['Close'].max())
    fig.add_trace(
        go.Scatter(
            x=data.index,
            y=data['Close'],
            mode='lines',
            name=f"{ticker} Close"
        )
    )
fig.update_layout(
    title="European Semiconductor Companies - Close Prices of last 500 days",
    xaxis_title="Time",
    yaxis_title="Close Price (€ or $ depending on listing)",
    legend_title="Company"
)
fig.show()
Next is the analysis of MACD (Moving Average Convergence Divergence), a widely used indicator in financial statistics and trading. It reveals general trends in the stocks that can inform buying and selling, which makes it an important step in stock market analysis. It's recommended to zoom into the plot to see the MACD results and candlestick chart more clearly.
ASML.AS: No Crossover → Bearish Trend
NXPI: No Crossover → Bullish Trend
IFX.DE: Cross Above Signal Line → Potential Bullish Signal
BESI.AS: No Crossover → Bearish Trend
NOD.OL: No Crossover → Bullish Trend
MELE.BR: No Crossover → Bearish Trend
AIXA.DE: No Crossover → Bullish Trend
SMHN.DE: No Crossover → Bearish Trend
AWEVF: No Crossover → Bullish Trend
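The trend labels above can be reproduced from the close prices. Below is a minimal pandas sketch, assuming the conventional 12/26/9 EMA spans; `macd` and `classify` are illustrative helpers, not part of the project code:

```python
import pandas as pd

def macd(close: pd.Series, fast: int = 12, slow: int = 26, signal: int = 9):
    """Return the MACD line, its signal line, and the histogram."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd_line = ema_fast - ema_slow
    signal_line = macd_line.ewm(span=signal, adjust=False).mean()
    return macd_line, signal_line, macd_line - signal_line

def classify(macd_line: pd.Series, signal_line: pd.Series) -> str:
    """Label the latest bar: a crossover occurs when the MACD line moves
    through the signal line between the last two bars; otherwise report
    the prevailing trend from the sign of the difference."""
    prev = macd_line.iloc[-2] - signal_line.iloc[-2]
    curr = macd_line.iloc[-1] - signal_line.iloc[-1]
    if prev < 0 <= curr:
        return "Cross Above Signal Line → Potential Bullish Signal"
    if prev >= 0 > curr:
        return "Cross Below Signal Line → Potential Bearish Signal"
    return "No Crossover → " + ("Bullish Trend" if curr > 0 else "Bearish Trend")
```

A steadily rising price series, for example, ends up labeled "No Crossover → Bullish Trend", matching the list format above.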
Figure 3: MACD and candlestick charts, one panel per ticker
RSI analysis
The next technical indicator analysis is RSI (Relative Strength Index). The indicator helps to identify overbought and oversold conditions, as well as buy and sell signals. Using RSI and MACD together is a common way to gauge stock market trends for trading.
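RSI can be computed with Wilder's smoothing. A minimal pandas sketch, assuming the conventional 14-day period; `rsi` is an illustrative helper, not part of the project code:

```python
import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Wilder's RSI: ratio of smoothed gains to smoothed losses, scaled to 0-100."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)
    loss = -delta.clip(upper=0.0)
    # Wilder's smoothing is an exponential moving average with alpha = 1/period
    avg_gain = gain.ewm(alpha=1.0 / period, min_periods=period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1.0 / period, min_periods=period, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```

Readings above 70 are conventionally treated as overbought and below 30 as oversold, which is how the RSI plots in this project are read.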
The GARCH (Generalized Autoregressive Conditional Heteroscedasticity) model is a popular statistical model for time series analysis, especially in trading and quantitative finance. The main application of GARCH-family models in finance is to examine and forecast market volatility. This is especially important for volatile, risk-laden markets like semiconductors. The Q-Q plot is an important sanity check for the market data: if the points fall on a straight line, the data is consistent with the reference probability distribution.
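The summaries below appear to come from GARCH(1,1) fits (the output format matches the arch package). As a reference, the core variance recursion and h-step forecast can be sketched in plain numpy; the helper names and the sample-variance initialisation are assumptions for illustration, not the project's code:

```python
import numpy as np

def garch_variance(returns, omega, alpha, beta, mu=0.0):
    """GARCH(1,1) conditional variance recursion:
    sigma2[t] = omega + alpha * e[t-1]^2 + beta * sigma2[t-1]."""
    e = np.asarray(returns, dtype=float) - mu
    sigma2 = np.empty_like(e)
    sigma2[0] = e.var()  # common choice: start at the sample variance
    for t in range(1, len(e)):
        sigma2[t] = omega + alpha * e[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

def garch_forecast(last_e2, last_sigma2, omega, alpha, beta, horizon=5):
    """h-step-ahead variance forecasts (as in the h.1 ... h.5 columns below).
    Beyond one step, E[e^2] equals the forecast variance, so the recursion
    iterates with alpha + beta."""
    out = []
    s2 = omega + alpha * last_e2 + beta * last_sigma2  # 1-step ahead
    for _ in range(horizon):
        out.append(s2)
        s2 = omega + (alpha + beta) * s2
    return out
```

When alpha + beta < 1, the forecasts converge toward the unconditional variance omega / (1 - alpha - beta), which is why long-horizon GARCH forecasts flatten out.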
Estimated GARCH(1,1) parameters for the eight maximum-likelihood fits present in the output (Dep. Variable: Close; Constant Mean; Normal distribution; robust covariance estimator; all models run Mon, Feb 16 2026):

| Fit | Obs. | mu     | omega  | alpha[1] | beta[1] | Log-Likelihood | AIC     |
|-----|------|--------|--------|----------|---------|----------------|---------|
| 1   | 3901 | 0.1002 | 0.0089 | 0.0722   | 0.9278  | -8841.68       | 17691.4 |
| 2   | 6621 | 0.0881 | 0.0664 | 0.0540   | 0.9370  | -15452.4       | 30912.8 |
| 3   | 7093 | 0.1554 | 0.0700 | 0.0483   | 0.9446  | -16940.8       | 33889.6 |
| 4   | 6620 | 0.1177 | 1.1929 | 0.0587   | 0.8456  | -17462.0       | 34931.9 |
| 5   | 6091 | 0.1293 | 0.1547 | 0.0836   | 0.8868  | -12943.7       | 25895.3 |
| 6   | 6971 | 0.0638 | 0.0851 | 0.0276   | 0.9669  | -18540.8       | 37089.6 |
| 7   | 6834 | 0.1177 | 0.1294 | 0.0407   | 0.9509  | -18235.1       | 36478.1 |
| 8   | 372  | 0.4701 | 3.6103 | 0.3482   | 0.6518  | -1144.35       | 2296.69 |

Fixed results: for comparison, every series was also evaluated with fixed, user-specified parameters (mu = 0.0235, omega = 0.0100, alpha[1] = 0.0600, beta[1] = 0.0000). Results generated with user-specified parameters; std. errors are not available when the model is not estimated.

Five-step-ahead forecasts from the volatility model (columns h.1 through h.5), per series and forecast date:

| Series | Date       | h.1       | h.2       | h.3       | h.4       | h.5       |
|--------|------------|-----------|-----------|-----------|-----------|-----------|
| 1      | 2026-02-10 | 8.343298  | 8.432530  | 8.521762  | 8.610994  | 8.700226  |
| 1      | 2026-02-11 | 8.034492  | 8.123724  | 8.212956  | 8.302188  | 8.391420  |
| 1      | 2026-02-12 | 7.965423  | 8.054655  | 8.143887  | 8.233119  | 8.322351  |
| 1      | 2026-02-13 | 7.635522  | 7.724754  | 7.813986  | 7.903218  | 7.992450  |
| 2      | 2026-02-10 | 6.912220  | 6.921321  | 6.930421  | 6.939523  | 6.948624  |
| 2      | 2026-02-11 | 8.565670  | 8.574810  | 8.583951  | 8.593091  | 8.602232  |
| 2      | 2026-02-12 | 8.662438  | 8.671581  | 8.680723  | 8.689867  | 8.699010  |
| 2      | 2026-02-13 | 8.095315  | 8.104444  | 8.113573  | 8.122703  | 8.131832  |
| 3      | 2026-02-10 | 4.625270  | 4.650100  | 4.674707  | 4.699094  | 4.723261  |
| 3      | 2026-02-11 | 4.508649  | 4.534527  | 4.560172  | 4.585586  | 4.610773  |
| 3      | 2026-02-12 | 4.472842  | 4.499041  | 4.525004  | 4.550735  | 4.576234  |
| 3      | 2026-02-13 | 4.411641  | 4.438389  | 4.464897  | 4.491167  | 4.517201  |
| 4      | 2026-02-10 | 8.258731  | 8.270012  | 8.281212  | 8.292332  | 8.303374  |
| 4      | 2026-02-11 | 7.913731  | 7.927467  | 7.941104  | 7.954645  | 7.968089  |
| 4      | 2026-02-12 | 7.645788  | 7.661430  | 7.676960  | 7.692381  | 7.707691  |
| 4      | 2026-02-13 | 8.181166  | 8.192998  | 8.204747  | 8.216411  | 8.227993  |
| 5      | 2026-02-10 | 16.765401 | 16.353632 | 15.981273 | 15.644555 | 15.340064 |
| 5      | 2026-02-11 | 15.477568 | 15.189060 | 14.928166 | 14.692242 | 14.478899 |
| 5      | 2026-02-12 | 14.478290 | 14.285425 | 14.111019 | 13.953307 | 13.810689 |
| 5      | 2026-02-13 | 14.618438 | 14.412159 | 14.225624 | 14.056942 | 13.904405 |
| 6      | 2026-02-10 | 11.456504 | 11.271090 | 11.091180 | 10.916611 | 10.747223 |
| 6      | 2026-02-11 | 10.554558 | 10.395918 | 10.241986 | 10.092623 | 9.947694  |
| 6      | 2026-02-12 | 9.737066  | 9.602691  | 9.472304  | 9.345789  | 9.223028  |
| 6      | 2026-02-13 | 8.818165  | 8.711066  | 8.607147  | 8.506312  | 8.408470  |
| 7      | 2026-02-10 | 15.372814 | 15.374587 | 15.376351 | 15.378105 | 15.379850 |
| 7      | 2026-02-11 | 15.007428 | 15.011183 | 15.014916 | 15.018630 | 15.022324 |
| 7      | 2026-02-12 | 14.901056 | 14.905387 | 14.909694 | 14.913978 | 14.918239 |
| 7      | 2026-02-13 | 15.621520 | 15.621945 | 15.622368 | 15.622788 | 15.623207 |
| 8      | 2026-02-10 | 15.527729 | 15.526322 | 15.524927 | 15.523544 | 15.522172 |
| 8      | 2026-02-11 | 14.966844 | 14.970162 | 14.973451 | 14.976713 | 14.979947 |
| 8      | 2026-02-12 | 15.737396 | 15.734223 | 15.731077 | 15.727957 | 15.724864 |
| 8      | 2026-02-13 | 15.601637 | 15.599608 | 15.597596 | 15.595600 | 15.593622 |
| 9      | 2026-02-10 | 10.592766 | 14.203046 | 17.813326 | 21.423607 | 25.033887 |
| 9      | 2026-02-11 | 10.591889 | 14.202169 | 17.812450 | 21.422730 | 25.033010 |
| 9      | 2026-02-12 | 10.591318 | 14.201598 | 17.811878 | 21.422159 | 25.032439 |
| 9      | 2026-02-13 | 10.590945 | 14.201226 | 17.811506 | 21.421786 | 25.032066 |
Figure 5: Volatility Model
Statistical checks
The next analysis checks the skewness and mode of the stock market data, among other statistical measures. These are important for a detailed understanding of the stock markets. Some analysts argue that positive skewness is a favorable indicator for buying. Other statistical measures, such as the mean and standard deviation, are also important for stock market analysis and can support buy/sell decisions.
Code
from scipy.stats import skew

for ticker, data in all_data.items():
    close_values = data['Close']
    close_skewness = skew(close_values)
    close_mean = np.mean(close_values)
    close_std = np.std(close_values)
    close_median = np.median(close_values)
    display(f"Skewness of {ticker}:", close_skewness)

    pearson_skewness = (3 * (close_mean - close_median)) / close_std
    display(f"Pearson's Second Skewness of {ticker}:", pearson_skewness)

    mode_val = close_values.mode().iloc[0]

    plt.figure(figsize=(8, 5))
    sns.kdeplot(close_values)
    plt.axvline(close_mean, label="Mean")
    plt.axvline(close_median, color="black", label="Median")
    plt.axvline(mode_val, color="green", label="Mode")
    plt.title(f"Distribution of {ticker} Close Prices (Skewness)")
    plt.xlabel("Price")
    plt.legend()
    plt.show()
| Ticker  | Skewness | Pearson's Second Skewness |
|---------|----------|---------------------------|
| ASML.AS | 1.7258   | 1.4886                    |
| NXPI    | 0.4363   | 0.7079                    |
| IFX.DE  | 1.0104   | 1.3985                    |
| BESI.AS | 2.0837   | 1.5448                    |
| NOD.OL  | 1.8909   | 1.3279                    |
| MELE.BR | 0.4616   | 1.3161                    |
| AIXA.DE | 3.2073   | 0.9586                    |
| SMHN.DE | 1.6805   | 1.2914                    |
| AWEVF   | 0.1577   | 0.5725                    |
Figure 6
XGBoost
XGBoost is the first of the ML models used in this project. XGBoost is one of the most popular gradient boosting implementations and adapts exceptionally well to time series data once dates are encoded as features. XGBoost is a fairly complicated model, so it's easier to interpret the results than the model internals. The time series line plot for these models includes only the last 500 days of historical data for easier visualization.
Code
import xgboost as xgb
from sklearn.metrics import mean_squared_error

colors = px.colors.qualitative.Alphabet

# First it's important to go through the data and separate each feature for training
def create_features(df, label=None):
    df = df.copy()
    df['date'] = df.index
    df['date'] = pd.to_datetime(df['date'])
    df['hour'] = df['date'].dt.hour
    df['dayofweek'] = df['date'].dt.dayofweek
    df['quarter'] = df['date'].dt.quarter
    df['month'] = df['date'].dt.month
    df['year'] = df['date'].dt.year
    df['dayofyear'] = df['date'].dt.dayofyear
    df['dayofmonth'] = df['date'].dt.day
    df['weekofyear'] = df['date'].dt.isocalendar().week
    X = df[['hour', 'dayofweek', 'quarter', 'month', 'year',
            'dayofyear', 'dayofmonth', 'weekofyear']]
    if label:
        y = df[label]
        return X, y
    return X

fig = go.Figure()
for i, (ticker, data) in enumerate(processed_data.items()):
    current_color = colors[i % len(colors)]
    data = data.sort_index()
    split_date = '10-Feb-2026'
    stock_train = data.loc[data.index <= split_date].copy()
    stock_test = data.loc[data.index > split_date].copy()
    X_train, y_train = create_features(stock_train, label='Close')
    X_test, y_test = create_features(stock_test, label='Close')

    reg = xgb.XGBRegressor(n_estimators=1000, early_stopping_rounds=50)
    reg.fit(X_train, y_train,
            eval_set=[(X_train, y_train), (X_test, y_test)],
            verbose=False)

    forecast_periods = 50
    data_recent = data.tail(500).copy()
    data_recent.index = pd.to_datetime(data_recent.index)
    data_recent = data_recent.sort_index()
    hist_x = data_recent.index
    future_start = hist_x[-1] + pd.Timedelta(days=1)
    future_dates = pd.date_range(start=future_start, periods=forecast_periods, freq='B')
    future_df = pd.DataFrame(index=future_dates)
    X_future = create_features(future_df)
    forecast = reg.predict(X_future)

    # Anchor the forecast trace at the last historical close so the lines connect
    last_hist_date = data_recent.index[-1]
    last_hist_close = data_recent['Close'].iloc[-1]
    plot_forecast_dates = pd.Index([last_hist_date]).append(future_dates)
    plot_forecast_values = np.concatenate(([last_hist_close], forecast))

    fig.add_trace(go.Scatter(
        x=data_recent.index, y=data_recent['Close'], mode='lines',
        name=f'Historical Market Close of {ticker}',
        line=dict(color=current_color)
    ))
    fig.add_trace(go.Scatter(
        x=plot_forecast_dates, y=plot_forecast_values, mode='lines',
        name=f'Predicted Future Close of {ticker}',
        line=dict(color=current_color, dash='dash')
    ))

fig.update_layout(
    title='Stock Close Price vs XGBoost Prediction',
    xaxis_title='Date',
    yaxis_title='Price',
    template='plotly_white'
)
fig.show()
Figure 7
Results
This section is an overview of the results from the preceding analyses and forecasts. Because the markets are extremely volatile and many of the stocks, most notably ASML, have risen sharply in value lately, making forecasts is difficult: according to the most recent data, some of the stocks were already expected to fall.
From the first line chart (Figure 1) and the market trend analysis you can see which companies have the strongest trends. ASML has been performing exceptionally well, but its stock value is experiencing a significant decrease. The other companies' stocks are similarly volatile.
From the MACD (Figure 3) and RSI (Figure 4) indicator analysis it's easy to see that the markets are very volatile. The RSI plots swing heavily between oversold and overbought for most of the companies. This makes stock market analysis especially difficult and leaves the markets heavily exposed to speculation. One of the most 'stable' markets is that of AWEVF, partly because it is a new company.
From the GARCH model (Figure 5) and the related statistical analysis, you can determine the most important trends and qualities when it comes to market volatility. In this case, the most important plots to look at are the 'estimated vs fixed volatility' and forecast plots (scroll left to see the forecast). These plots can inform decisions in risk management, banking regulation, and derivatives management.
The statistical checks are somewhat optional, but measures such as skewness (Figure 6) can reveal significant details about the stock markets.
Finally, the project's time series forecast model: XGBoost (Figure 7). As you can see from the graphs, XGBoost gives fairly realistic forecasts of the market close values. For some of the tickers, however, the model proves too primitive and produces unrealistic forecasts; because its features are purely calendar-based, it cannot extrapolate price levels beyond the range seen in training.
The goal of this project has been to provide a wide variety of tools and models for stock market analysis and forecasting that can be applied to trading, investing, portfolio management, and similar tasks. The models should not be treated as firmly accurate, but as experimental.